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Genome-wide analysis of HPV integration in human 
cancers reveals recurrent, focal genomic instability 
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Genomic instability is a hallmark of human cancers, including the 5% caused by human papillomavirus [HPV). Here we 
report a striking association between HPV integration and adjacent host genomic structural variation in human cancer 
cell lines and primary tumors. Whole-genome sequencing revealed HPV integrants flanking and bridging extensive host 
genomic amplifications and rearrangements, including deletions, inversions, and chromosomal translocations. We present 
a model of "looping" by which HPV integrant-mediated DNA replication and recombination may result in viral-host 
DNA concatemers, frequently disrupting genes involved in oncogenesis and amplifying HPV oncogenes E6 and E7. Our 
high-resolution results shed new light on a catastrophic process, distinct from chromothripsis and other mutational 
processes, by which HPV directly promotes genomic instability. 



[Supplemental material is available for this article.] 

Human papillomavirus (HPV) causes —610,000 cancers worldwide 
each year (Forman et al. 2012). Landmark studies established 
that HPV is the principal cause of virtually all cervical cancers 
(Walboomers et al. 1999) and subsets of other anogenital and head 
and neck cancers (Gillison et al. 2000; IARC Working Group on the 
Evaluation of Carcinogenic Risks to Humans 2007). The incidence 
of HPV-associated head and neck squamous cell carcinoma 
(HNSCC) in particular is increasing at an alarming rate (Chaturvedi 
et al. 2011). The transforming ability of oncogenic HPV types has 
been attributed to two viral oncoproteins, E6 and E7, which in- 
activate p53 and members of the pRb family (Moody and Laimins 
2010), respectively. E6 and E7 expression is essential to the viral life 
cycle, as HPV coopts the cellular DNA replication machinery. Al- 
though their expression is sufficient for cellular immortalization in 
vitro (Hawley-Nelson et al. 1989; Munger et al. 1989), secondary 
genetic events are necessary for cancer development in vivo 
(Moody and Laimins 2010). 

Early upon infection, HPV genomes replicate as extra-chro- 
mosomal elements in the nucleus. Viral integration frequency 
increases with severity of cervical precancers, and viral integrants 

"These authors contributed equally to this work. 
"These authors contributed equally to this work. 
1 Corresponding authors 
E-mail david.symer@osumc.edu 
E-mail maura.gillison@osumc.edu 

Article published online before print. Article, supplemental material, and publi- 
cation date are at http://vwvw.genome.org/cgi/doi/10.1101/gr.164806.113. 



are present in the majority of cervical cancers (Wentzensen et al. 
2004; Xu et al. 2013). Integration imparts both a selective growth 
advantage and genomic instability of the infected cell (Pett et al. 
2004) through enhanced expression and stabilization of viral on- 
cogene transcripts (Jeon and Lambert 1995). Increased transcrip- 
tion has been attributed to frequent disruption of the viral re- 
pressor HPV E2 (Thierry and Yaniv 1987). HPV integrants in 
cervical cancers have been statistically associated with regional 
structural abnormalities (Peter et al. 2010), but the relationship of 
the virus to such variants, their detailed genomic structures, and 
their functional significance remain largely unknown. 

To study the impact of HPV integration on host genomic 
structure and function, we performed whole-genome sequencing 
(WGS), RNA-seq, spectral karyotyping (SKY), fluorescence in situ 
hybridization (FISH), and other molecular assays. We chose 10 
HPV-positive and -negative human cancer cell lines because they 
provided relatively unlimited and experimentally tractable ge- 
nome samples representing cervical and head and neck cancers. 
Our comprehensive, genome-wide analysis, performed at single- 
nucleotide resolution, identified a striking and recurrent association 
between HPV integrants and adjacent genomic amplifications and 
rearrangements. WGS also confirmed that similar HPV-associated 
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structural variations occur frequently in primary, HPV-positive 
head and neck cancers. 

Results 

We conducted WGS of 2 X 100-bp paired-end libraries from 10 
cancer cell lines (two cervical and eight head and neck) and two 
HPV16/18-positive primary HNSCC (tonsillar and oral cavity), 
using the HiSeq 2000 platform (Illumina Genome Network). The 
cell lines were established from five HPV16-positive (HMS001, 
UM-SCC-104, UD-SCC-2, UM-SCC-47, UPCI:SCC090) and three 
HPV-negative (CAL 27, D562, SCC-25) HNSCC samples, and two 
HPV16-positive (SiHa, CaSki) cervical cancers (Table 1). 

Variability in HPV genomes and transcriptomes 

Significant numbers of WGS reads aligned to the reference viral 
genome in HPV-positive (Fig. 1A) but not HPV-negative cases. Viral 
copy number estimates ranged from zero to 830 per cell, using 
sequence depths of coverage in autosomes as a guide. Comparable 
viral copy number estimates were obtained by quantitative PCR 
(qPCR) assays of the viral E6 gene, normalized against a control 
(Table 1). 

Numerous sample-specific single-nucleotide polymorphisms 
(SNPs; n = 24-119) were identified in homozygosity in viral se- 
quences, confirming a clonal relationship between virus and tu- 
mor (Supplemental Tables 1, 2; Gillison et al. 2000). The depth of 
sequence coverage across the ~8-kb viral genome showed con- 
siderable variability (Fig. 1A). Uniform viral coverage was found in 
Tumor B, consistent with a predominance of viral episomes. In 
contrast, the other eight HPV-positive samples had nonuniform 
coverage or were missing portions of the viral genome, prompting 
us to search for insertional breakpoints, defined as the junctions 
between viral and host genome sequences. Breakpoint calls were 
based on two or more discordant paired-end reads aligned to the 
human reference genome within a 500-bp interval. A total of 111 
unique insertional breakpoints were identified in the nine HPV- 
positive cases; their frequencies ranged from two to 47 per sample. 
Targeted PCR amplification and Sanger sequencing were used to 
confirm the vast majority (95%; 105 of 1 11) (Supplemental Table 3). 



Collectively, the insertional breakpoints mapped broadly 
across the HPV genome (Fig. IB). In several cases (e.g., SiHa, 
HMS001, UM-SCC-104, and UM-SCC-47 cells), they flanked 
missing viral segments, indicating loss at integration. Portions of 
the E2 gene were missing in four of nine cases (Fig. 1A; Choo et al. 
1987). Rearrangements within the viral genome also contributed 
to nonuniform viral coverage in UD-SCC-2, UPCI:SCC090, CaSki, 
and Tumor A (Supplemental Fig. 1). Despite wide-ranging viral 
genomic fragmentation, viral fragments containing E6 and E7 
oncogenes were retained in all samples. 

Since detection of HPV E6 and E7 transcripts remains the gold 
standard for defining cancers caused by HPV (Gillison and Shah 
2003), we assessed viral gene expression using RNA-seq in all cases 
except SiHa (evaluated only by RACE). All virus-positive cases 
displayed strong but variable expression of HPV transcripts (ranging 
from 11.1 to 533.8 fragments per kilobase of transcript per million 
reads mapped, fpkm) (Supplemental Fig. 2). Canonical, spliced viral 
transcripts originating from the early p97 promoter, with coding 
potential for E6*I and E7, were found in all cases. 

HPV and the cellular genome 

WGS data revealed millions of SNPs and insertion/deletion (indel) 
polymorphisms in each host cell genome sample (Supplemental 
Fig. 3). We confirmed an inverse association between HPV status 
and inactivating mutations in TP53 and CDKN2A (Fisher's exact, 
P = 0.004) (Gillison et al. 2000; Westra et al. 2008). Mutations, de- 
letions, or chromosomal translocations also were detected in genes 
frequently mutated in HNSCC, including PTEN, PIK3CA, PIK3API, 
NOTCH1, TP63, SYNE1, SYNE2, and CASP8 (Agrawal et al. 2011; 
Stransky et al. 2011). In UM-SCC-104 cells, homozygous deletions 
in NOTCH1 resulted in its predicted frameshift and premature 
truncation (Supplemental Table 4). Recurrent chromosomal gains 
and losses included 5p (gain) and 3p26 (loss) (Heselmeyer et al. 
1997) found in six of seven HPV-positive HNSCC samples (Sup- 
plemental Figs. 4, 5). 

Focal clustering of HPV integrants 

Collectively, the HPV integrants mapped broadly across the hu- 
man genome; we obtained no evidence for recurrent integration in 



Table 1. Summary of 12 cancer samples analyzed in this study: clinical origin, HPV viral status, and references 

Viral copy Viral copy Number of 

Sample Anatomical HPV Viral number, number, detected viral 

name Gender site of origin status variant 3 real-time PCR b WGS C breakpoints, WGS d Cell line reference 



SiHa 


F 


Cervix 


HPV16 


EUR 


0.397 


1.5 


2 


Friedl et al. 1970 


CaSki 


F 


Cervix 


HPV16 


EUR 


122 


831.6 


47 


Pattillo etal. 1977 


UM-SCC-104 


M 


Oral cavity 


HPV16 


EUR 


2.86 


1.1 


2 


Tang etal. 2012 


UD-SCC-2 


M 


Hypopharynx 


HPV16 


EUR 


14.3 


23.4 


7 


Ballo etal. 1999 


UM-SCC-47 


M 


Tongue 


HPV16 


AFR2a 


21.1 


47.0 


6 


Lansford et al. 1999 


UPCI:SCC090 


M 


Tongue 


HPV16 


EUR 


182.0 


483.0 


33 


White et al. 2007 


HMS001 


M 


Tonsillar fossa 


HPV16 


ASN 


0.81 


1.0 


2 


This study 


Tumor A 


M 


Oral cavity 


HPV18 




41.9 


14.3 


10 


Tumor B 


M 


Tonsil 


HPV16 


EUR 


5.7 


11.0 


2 




CAL 27 


M 


Tongue 


Negative 


N/A 


0 


0 


N/A 


Gioanni etal. 1988 


D562 


F 


Pharynx 


Negative 


N/A 


0 


0 


N/A 


Peterson et al. 1971 


SCC-25 


M 


Tongue 


Negative 


N/A 


0 


0 


N/A 


Rheinwald and Beckett 1981 



a HPV variant: (EUR) European; (ASN) Asian; (AFR) African. Nomenclature is based on variants in E6 and LCR regions by the IARC HPV Variant Study Group 
(Cornet etal. 2012). 

b Viral copy number as measured by quantitative real-time PCR. Copy number reflects the ratio of HPV1 6 E6 to ERV3. 
c Viral copy number as determined by whole-genome sequencing (WGS). 
d Viral— cellular breakpoints detected by whole-genome sequencing (WGS). 
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Figure 1. The HPV virome in human cancers. WGS reads were aligned to the HPV1 6 (x-axis coordinates, bottom) or HPV1 8 (Tumor A only) reference 
genomes. (A) In each row, histograms for various cancer samples indicate copy numbers of viral sequences (blue) and counts of discordant paired-end 
reads supporting insertional (red), intraviral (yellow), or both (orange) breakpoints. The scale (/-axis) of each plot was normalized to maximum read 
counts (upper left, each row). HPV breakpoints are defined further in Supplemental Figure 1 and Supplemental Table 3. (B) Verification of insertional 
breakpoints. Counts of discordant paired-end reads supporting each breakpoint (displayed both on the /-axis and by dot sizes), identified in any of the 
samples studied, are shown according to position in HPV16 genome (x-axis). (Black dots) Confirmed by PCR and Sanger sequencing; (red) failed PCR; 
(blue) not tested; (LCR) long control region; (OR) origin of replication. See also Supplemental Figure 2 and Supplemental Tables 3 and 4. 



chromosomal hotspots. However, we observed HPV insertional 
breakpoints to cluster at specific chromosomal regions in different 
cell lines (Fig. 2; Supplemental Fig. 5). Genomic distances between 
viral integrants in a "cluster/' defined by three or more unique 
insertional breakpoints, spanned up to 3 Mb (Tumor A). As ex- 
amples, HPV integrant clusters mapped to chromosome Xq21 in 
UD-SCC-2 cells, chromosome 3pll in UM-SCC-47, and chromo- 
somes 3pl2, 6p21, and 9q22 in UPCI:SCC090 cells (Fig. 2). 

To examine HPV integrant clusters further, we performed 
both fluorescence in situ hybridization (FISH) using HPV16-spe- 
cific probes and spectral karyotyping (SKY) on all cell lines. Several 
observations were made. Chromosomal locations of the majority 
of integrant clusters identified by WGS were confirmed (Fig. 2; 
Supplemental Fig. 6), but low-copy-number integrants were not 
detected (e.g., HMS001 and UM-SCC-104). HPV FISH signals were 
found in identical cytogenetic regions of duplicate chromosomal 



copies, but with variable intensities, suggesting that integration 
preceded chromosomal duplication and subsequent divergence in 
viral copy numbers. Chromosomal paints and HPV FISH signals 
routinely overlapped, consistent with alternating viral and host 
genomic sequences. Combined FISH and SKY analysis also fre- 
quently identified HPV integrants adjacent to chromosomal 
translocations. Although these juxtapositions also were observed 
from WGS data, in several instances, the physical distance between 
HPV insertions and translocations exceeded 1 Mb. 

HPV insertions flank copy number variations (CNVs) 

HPV insertional breakpoints detected by WGS were observed to be 
less numerous than viral copy numbers in some cases (Table 1). We 
hypothesized that this discrepancy was due to amplification of 
viral integrants and flanking genomic sequences, leading to 
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Figure 2. HPV integration sites are clustered. HPV insertional breakpoints (dots), identified from discordant paired-end WGS reads, were mapped to 
human chromosomes (karyotypes, center). FISH and SKY images of chromosomes from various cancer cell lines (background colors, see key) are shown to 
the left and right of the karyotypes. Each set of FISH/SKY images includes (left) spectral karyotypes, (middle) chromosome banding patterns with HPV 
insertion sites (pink or teal fluorescent signals), and (right) pseudo-colored cartoons depicting chromosomal composition including translocations. The 
color key indicates the sample source. See also Supplemental Figures 5 and 6. 



redundant, identical breakpoints (Fig. 3). Therefore, we examined 
WGS coverage across chromosomal regions containing HPV inte- 
grants. We observed extensive, focal genomic CNVs at these in- 
tegration sites in eight of nine HPV-positive cases. Viral integrants 
(Fig. 3, red histograms) resided directly adjacent to or within 
sharply demarcated fluctuations in depth of WGS coverage. HPV 
integrants were found at regions of amplification (blue histo- 
grams), ranging from an —1.5 -fold increase in HMS001 to an 
~58-fold increase in UPCI:SCC090 cells (Supplemental Figs. 7, 8; 
Supplemental Table 5). Viral integrants also flanked regions wherein 
deletions occurred (Fig. 3, yellow), ranging from 487 bp in HMS001 
(Fig. 4B) up to 234 kb in chromosome 3 of UPCI:SCC090 (data not 
shown). Only UM-SCC-104 cells lacked CNVs flanking a lone HPV 
integrant (Fig. 3G). We confirmed regions of amplification by 
quantitative PCR assays (Supplemental Fig. 8) and regions of de- 
letion by spanning PCR and Sanger sequencing (data not shown). 

Statistical analysis confirmed a strong association between 
HPV insertional breakpoints and genomic structural variation in 
eight of nine cases, including chromosomal translocations, de- 



letions, inversions, and/or intrachromosomal rearrangements 
(Bonferroni adjusted binomial test, all P < 10~ 9 ) (Supplemental 
Fig. 7). Moreover, such enrichment occurred at focal hyper- 
amplification sites in six of nine samples (local ploidy >8N, 
Bonferroni adjusted binomial test, all P < 10~ 10 ). This statistical as- 
sociation between HPV integrants and focal hyperamplifications 
(>8N) was not observed in samples harboring low-copy-number 
or predominantly unintegrated viral genomes (i.e., UM-SCC-104, 
HMS001, and Tumor B). 

Resolving focal genomic amplifications and rearrangements 

We sought to elucidate the actual linear DNA structures flanking HPV 
insertional sites. Guided by the breakpoints and CNVs called from 
WGS data (Fig. 3), we performed targeted PCR, Sanger sequencing, 
and chromosome walking to develop connectivity maps (Figs. 4-6). 

A recurrent pattern of focal amplifications and rearrange- 
ments adjacent to HPV integration sites was seen in six of seven 
HPV-positive cell lines (Figs. 4, 5). In SiHa cells, two insertional 
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Figure 3. Focal CNVs and breakpoints adjacent to HPV insertions. Histograms showing the depth of WGS coverage (/-axis) of well-aligned (top; yellow, 
blue, and green) and breakpoint sequence reads {bottom; red and gray), mapped to the reference human genome (hgl 9) at indicated chromosomal loci 
(x-axis, gene schematics, coordinates in megabases). (Top) Focal CNVs and (bottom) HPV insertional (red) or host-host (gray) breakpoints are shown for 
cancer samples (A) SiHa; (B) HMS001; (C) UM-SCC-47; (D) UD-SCC2; (£) UPCI:SCC090, chromosome 9; (0 UPCI:SCC090, chromosome 6; and (C) UM- 
SCC-104. (Top, /-axis labels) The depth of WGS coverage is shown for cancer samples (blue); normal control (yellow); overlapping, shared coverage 
regions (green). (Bottom) Maximal breakpoint read counts (black) and breakpoint ID numbers (HPV bk, red). 
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Figure 4. Focal amplifications and rearrangements explained by "looping" model. Schematics of HPV target sites before (top) and after (middle) viral 
integration in (A) SiHa and (B) HMS001 cells. We defined genomic segments (marked with capital letters and grayscale fills) based on HPV insertional (red 
lines) or host-host breakpoints (gray lines). (Top, white rectangles and horizontal lines) Gene exon structures; (arcing clotted lines, arrows: red) con- 
nections between HPV insertional breakpoints; (black) host-host breakpoints joining discontinuous reference genome sequences; (light blue histograms) 
depth of WGS coverage depicting CNVs (see Fig. 3). (Middle: direction of large red arrows) Relative orientation of sense strand of HPV reference genome, 
not drawn to scale; (white numbers and gaps) viral breakpoint ID numbers and rearrangements; (blue bars) confirmatory PCR amplicons and Sanger 
sequencing. (Bottom) Inferred, transient looping models to explain formation of CNVs and observed connections between breakpoints. (C) A generalized, 
stepwise looping model depicting HPV-associated structural variation at YFG (your favorite gene, top) as a target, inferred target nicking and integration of 
linear HPV genome (red), transient formation of circular DNA containing viral sequences, rolling circle amplification of this template (gray circular arrows), 
and formation of concatemers harboring identical viral-host and host-host breakpoints. The resulting observed focal amplifications and rearrangements, 
adjacent to HPV integrants, disrupt expression of YFG (bottom). 
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Figure 5. Connectivity maps of complex, HPV-associated genomic structural variation in HNSCC cell lines. Schematics of genomic target loci before 
(top) and after (bottom) HPV integration in HNSCC cell lines (A) UM-SCC-47 and (B) UPCI:SCC090, chromosome 6. Target gene schematic and con- 
nectivity map features are as described in the Figure 4 legend. HPV breakpoint numbers are listed in Supplemental Table 3. These schematics including the 
viral insertions are not drawn to scale. A short inversion in the viral sequence flanking breakpoint 3 is not shown. (Parentheses) Genomic segments with 
indicated fold amplification (x N) calculated from WGS data; (upside-down letters) inverted segments. See Supplemental Figure 9 for connectivity maps 
for both UD-SCC-2 and UPCI:SCC090, chromosome 9. 



breakpoints directly flanked a twice-amplified, 300-kb segment 
(Figs. 3 A, 4 A, red lines). Targeted Sanger sequencing confirmed 
these two breakpoints to be connected by 7 kb of HPV16 ge- 
nomic sequence, despite being mapped —300 kb apart in the 
reference genome (Fig. 4A, red lines). This HPV-mediated re- 
arrangement occurred twice, concatemerizing large, flanking 
genomic segments head-to-tail and resulting in two HPV inte- 
grants bounded by identical breakpoints. In one of the duplicated 
segments, a secondary rearrangement resulted in an ~72-kb in- 
ternal deletion. 

In HMS001 cells, two viral insertional breakpoints delineated 
an HPV16 integrant that spanned a small genomic deletion (Fig. 
4B). Adjacent host genome fragments and viral sequences were 
duplicated in tandem, resulting in two identical HPV integrants 
and an intrachromosomal rearrangement. 

These relatively simple, resolved linear genome structures ad- 
jacent to HPV integration sites prompted us to construct a "looping" 
model to explain their occurrence (Fig. 4C). We hypothesized that 
HPV integrants could bridge noncontiguous genomic sequences, 
perhaps by connecting nicked sites. We also surmised that replica- 
tion of transient, circularized genomic segments would result in 
focal amplification of breakpoints involving viral and flanking host 
genomic sequences. Subsequently, distal free ends of amplified 
strands could recombine with noncontiguous sequences and un- 
dergo repair, resulting in concatemerized linear structures bounded 
by recurrent, identical breakpoints (Fig. 4C). 

Remarkably consistent patterns of concatemerized host-virus 
sequences were identified in all but one of the HPV-positive cell 
lines (Figs. 3-6). The resolved, linear DNA structures at HPV inte- 
grant clusters in cell lines UM-SCC-47 and UD-SCC-2, and chro- 
mosomes 6 and 9 of UPCI:SCC090 were considerably more com- 
plex than in SiHa and HMS001 cells (Figs. 4, 5; Supplemental 
Fig. 9). Nevertheless, genomic amplifications and rearrangements 
connected by identical HPV integrants (red lines) and/or host-host 
recombinant breakpoints (gray lines) were found in each sample. 
HPV-associated genomic deletions (UD-SCC-2) and inversions 
(UM-SCC-47, UD-SCC-2, UPCI:SCC090) also were identified (Fig. 5; 



Supplemental Fig. 9). Notably, the viral E6 and E7 oncogenes 
were amplified within the viral-host concatemers in most samples 
studied. 

For example, at the TP63 locus in UM-SCC-47 cells (Figs. 3C, 
5A), we observed five insertional breakpoints to demarcate four 
unique HPV integrants. Consistent with our looping model, a viral 
integrant bridged discontinuous host sequences (Fig. 5 A, segments 
B and F) that were amplified, resulting in a total of —25 HPV 
integrants in a concatenated array. 

In UD-SCC-2 cells, HPV-mediated genomic amplifications 
flanked a >100-kb deletion (Fig. 3D; Supplemental Fig. 9A). Certain 
exons of the DIAPH2 gene were amplified (e.g., segments K-S), 
while others were deleted (segment J). 

In UPCI:SCC090 cells, 15 HPV insertional breakpoints were 
clustered on chromosome 9 (Fig. 3E; Supplemental Fig. 9), and five 
were clustered on chromosome 6 (Figs. 3F, 5B). Several integrants 
on chromosome 9 shared an identical 5' breakpoint but were 
connected to unique 3' breakpoints (Supplemental Fig. 9B). The 
shared 5' breakpoints again were consistent with our looping 
model (Fig. 4C), whereby the 3' end of a primary HPV integrant 
subsequently could recombine at multiple upstream genomic sites. 
We also observed >90-fold amplification of host DNA bridged by 
another HPV integrant. These results confirmed that looping could 
occur at more than one chromosome in the same cell. Similar am- 
plifications, deletions, and inversions were present in numerous 
HPV integrant clusters in CaSki cells (Supplemental Figs. 5, 6). 

To exclude the possibility that the observed associations be- 
tween HPV integrants and flanking genomic variants (Figs. 3-5) 
were merely artifacts of cell culture, we also evaluated two primary 
human tumors by WGS (Fig. 6; Table 1; Supplemental Table 1). In 
Tumor A, an insertional breakpoint cluster on chromosome 5 in- 
cluded HPV-18-associated deletions, inversions, and amplifica- 
tions (Fig. 6A-C). The involved locus spanned ~2 Mb in length 
and disrupted genomic structures of several genes including 
ACTBL2 and a candidate oncogene, PDE4D (Lin et al. 2013). 

In Tumor B, which harbored predominantly episomal HPV, 
WGS demonstrated that a small fragment of HPV16 integrated 
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directly into the midst of a chromosomal translocation, juxtaposed 
between chromosome llql3 (directly involving CCND1, encod- 
ing cyclin Dl) and chromosome 8pll (Fig. 6D-F). We found sev- 
eral insertional and host-host breakpoints spanning >2-Mb segments 
on chromosome 11 and almost 5 Mb on chromosome 8, supporting 
an interwoven series of connections between the chromosomes that 
again is consistent with our looping model. Based on WGS sequence 
coverage and estimated chromosomal ploidies, we infer that HPV 
sequences connected segments of two chromosomes, resulting in an 
apparent dicentric translocation t(ll;8) together with monosomic 
chromosomes 8 and 11 (Fig. 6E,F). 

Importantly, preliminary analysis of WGS data from 21 ad- 
ditional HPV-positive HNSCC primary tumors indicated that ap- 
proximately half possessed CNVs and rearrangements flanked di- 
rectly by HPV insertional breakpoints (data not shown). 

HPV integrants disrupt cellular genes by multiple mechanisms 

The possibility that HPV integrants could be enriched within or 
near cellular genes was evaluated in the nine HPV-positive cell 
lines and tumors. Confirmed breakpoints mapped 5' (12.6%) or 3' 
(15.3%) to genes, within exons (2.7%), within introns (39.6%), or 
in intergenic loci (29.7%), using NCBI RefSeq genes as a reference. 
We observed a modest enrichment of HPV integrants within 50 kb 
of RefSeq genes (binomial test, P-value = 0.063), within genomic 
fragile sites (binomial test, P-value = 0.02), or within DNase I hy- 
persensitivity sites (binomial test, P-value = 0.003) when all inte- 
grants were considered. However, these associations were no lon- 
ger significant after accounting for interdependence of integrant 
clusters (Supplemental Table 6). 

Gene expression analysis by RNA-seq and 5' and 3' RACE 
experiments revealed expression of virus-host fusion transcripts in 
all cell lines and Tumor A. Transcripts initiated from the viral early 
region promoters (p97, p670) were fused directly with or were 
alternatively spliced with genomic templates. In SiHa, UPCI:SCC090, 
and UD-SCC-2 cells, viral fusion transcripts were expressed from 
more than one integrant within a cluster. Importantly, transcript 
structures frequently confirmed genomic rearrangements found by 
WGS and Sanger sequencing (Fig. 7). 

Many insertional breakpoints flanked or disrupted genes in- 
volved in carcinogenesis (Supplemental Table 7). For example, an 
HPV integrant in UD-SCC-2 cells deleted and rearranged portions 
of DIAPH2 (Fig. 7A-E), which encodes a protein involved in fidelity 
of sister-chromatid separation (DeWard et al. 2010; Cheng et al. 
2011). Viral-host fusion transcripts flanking the deletion were 
expressed strongly, while native DIAPH2 transcripts were not. The 
protein encoded by DIAPH2, i.e., diaphanous-related formin 2, 
was absent from the UD-SCC-2 cell line, as shown by Western blot 
(Fig. 7D). 

In a different form of gene disruption, HPV-mediated gene 
amplification in UM-SCC-47 cells resulted in aberrant expression 
of TP63, a key regulator of epithelial differentiation (Fig. 7F-I; 
Bergholz and Xiao 2012). Several distinct viral-host fusion tran- 
scripts and a novel, truncated 25-kDa protein at the carboxyl ter- 
minus of TP63 were expressed. The truncated protein includes the 
TID domain, a dominant-negative regulator of the pro-apoptotic 
protein TAp63 (Serber et al. 2002). 

Additional examples of gene disruption were found (Supple- 
mental Figs. 10, 11). These included amplification of the onco- 
genes FOXE1 (Katoh et al. 2013) and PIM1 (Narlik-Grassow et al. 
2013) in UPCI:SCC090 cells, resulting from HPV-associated looped 
structures. We also found "gene breaking" (Wheelan et al. 2005) by 



an HPV integrant in SLC47A2, a solute carrier, in UM-SCC-104 
cells, caused by premature termination of transcripts at the HPV E5 
polyadenylation site. 

Discussion 

Genomic instability is a critical determinant of virus-induced car- 
cinogenesis, and indeed of virtually all human cancers (Hanahan 
and Weinberg 2000). In the case of HPV, such instability has been 
attributed largely to functions of the viral E6 and E7 oncogenes 
(Moody and Laimins 2010). Here, at single-nucleotide resolution, 
we mapped HPV integrants in human cancer genomes and showed 
that they directly flank genomic structural variations, including 
focal amplifications, rearrangements, deletions, and/or trans- 
locations. These genomic alterations frequently disrupted the ex- 
pression and structure of neighboring genes involved in onco- 
genesis, and resulted in amplification and expression of E6 and E7. 
Our results shed new light on these potentially critical, cata- 
strophic steps in the progression from an initial viral infection to 
development of an HPV-associated cancer. 

Previous studies, lacking high-resolution WGS data, reported 
HPV breakpoints preferentially in the viral E2 gene (Thierry and 
Yaniv 1987). Recently a new method based on targeted amplifi- 
cation of particular HPV sequences together with flanking geno- 
mic sequences was described (Xu et al. 2013). However, although 
we also observed recurrent disruptions of E2, viral breakpoints 
were collectively distributed more broadly across the viral genome 
(Fig. 1). Despite frequent fragmentation of HPV integrants, the E6 
and E7 viral oncogenes were retained and expressed as viral-host 
fusion transcripts in all virus-positive samples (Fig. 1; Supple- 
mental Fig. 2; Lace et al. 2011; Khoury et al. 2013). 

Prior data, based on SNP arrays, suggested that approximately 
one-third of HPV integrants in cervical cancers were statistically 
associated with regional CNVs (Peter et al. 2010). In contrast, our 
identification of a recurrent pattern of extensive, focal CNVs and/ 
or other forms of structural variation immediately adjacent to HPV 
integrants, all mapped at single-nucleotide resolution, highlighted 
the powerful capabilities of WGS in the current era of personalized, 
genomic medicine (Figs. 3-6). Use of cell lines facilitated FISH and 
SKY experiments (Fig. 2), which corroborated these viral insertion 
sites and their proximity near sites of structural variation. Both 
targeted DNA sequencing and viral-host fusion transcript struc- 
tures also confirmed such structural variations including con- 
catemerization of viral and host sequences (Figs. 4-7). 

Our inclusion of primary tumors here verified that HPV-as- 
sociated genomic variation is not a mere artifact of cell culture. 
Moreover, our WGS analysis of numerous additional primary HPV- 
positive HNSCC specimens independently confirmed that HPV- 
associated amplifications and rearrangements were present in ap- 
proximately half of them (data not shown). Moreover, the recently 
sequenced HeLa genome also revealed multiple HPV- 18 integrants 
(with one HPV-32) clustered in an ~10-kb region on chromosome 
8, again flanked by focal structural variants (Adey et al. 2013; 
Landry et al. 2013). 

The recurrent patterns of focal genomic instability adjacent to 
HPV integrants, observed across diverse cancer samples arising in 
different tissues, led us to infer a possible looping model to explain 
their formation (Fig. 4C). Organized concatemers composed of 
viral and host sequences, adjoined by identical HPV-breakpoint 
sequences, may have accumulated as a result of DNA replication 
originating from the viral origin of replication. This model is 
compatible with previous experiments in which viral El and E2 
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proteins were overexpressed, inducing replication of extant HPV 
integrant sequences in HeLa and SiHa cells (Kadaja et al. 2007; 
Peter et al. 2010). However, the relatively low-resolution Southern 
blots and SNP arrays employed in those studies did not fully elu- 
cidate the resulting structural variants. 

Host DNA polymerases and transcription factors are involved 
in HPV replication, which can occur either bi-directionally or as 
a rolling circle (Flores and Lambert 1997). Our model is particularly 
compatible with the latter. Transient formation of looped, circular 
structures composed of HPV integrants and adjacent chromosomal 
sequences would facilitate their replication as a rolling circle (Fig. 
4C). Subsequent repair of linear structures consisting of virus- 
host concatemers would lead to the resulting, observed intra- 
chromosomal amplifications and rearrangements (breakpoints) 
(Fig. 2). We note that virtually all identified looped structures 
contained the HPV origin of replication (OR) (Fig. 1). In the viral 
life cycle, HPV replication is initiated by E2 binding and re- 
cruitment of the El helicase to the OR (McBride 2008). It also 
depends on activation of the DNA damage response by ATM 
(Gillespie et al. 2012) and ATR (Reinson et al. 2013). Therefore, 
looped structures and associated rearrangements could be by- 
products of aberrant HPV integrant replication. 

Looping as a proposed mechanism for focal genomic in- 
stability bears some similarities with other mechanisms, but also is 
distinct from each. In chromothripsis, seen in 2%-3% of human 
cancers, the observed chromosomal fragmentation is focal but 
disorganized. Chromothripsis breakpoints appear juxtaposed in 
a random pattern (Crasta et al. 2012; Forment et al. 2012; Molenaar 
et al. 2012; Rausch et al. 2012; Korbel and Campbell 2013) in 
contrast with the organized concatemers described here. However, 
both mechanisms maybe replication-dependent (Liu et al. 2011). 
Hallmarks of chromothripsis recently were identified in HeLa cells, 
but not at the HPV integration sites (Landry et al. 2013). In 
breakage-fusion-bridge cycles, palindromic sequences are thought 
to trigger focal amplification and regional genomic instability 
(Tanaka and Yao 2009; Hillmer et al. 2011). However, we did not 
observe palindromes at intrachromosomal or HPV insertional 
breakpoints. LI retrotransposition can induce forms of structural 
variation including deletions, inversions, and chromosomal trans- 
locations, but focal amplifications have not been associated with it 
(Symer et al. 2002). The formation of other breakpoints in cancers 
may occur via nonhomologous end joining (Chiang et al. 2012; 
Malhotra et al. 2013). The small numbers of HPV insertional 
breakpoints detected by WGS, even in the setting of extraordinarily 
high HPV copy numbers, argue effectively against disorganized or 
"random" breakpoints, as might be expected in aneuploidy, non- 
homologous end joining or chromothripsis. 

A temporal relationship between HPV integration, enhanced 
expression of E7, and the subsequent development of quantitative 
and structural chromosomal abnormalities has been observed in 
cell culture (Pett et al. 2004). Increased E7 expression confers a se- 
lective growth advantage (Jeon and Lambert 1995); thus we con- 
clude that cells bearing HPV integrant clusters, with amplified viral 
genomes and transcriptionally active viral promoters, would be 
clonally selected. Deregulated expression of HPV E6 and E7 
oncoproteins is a sine qua non of HPV-associated malignancies. 

Prior evidence supporting HPV as an insertional mutagen has 
been scant. In a few reports, viral integration was associated with 
loss of heterozygosity of tumor suppressor genes (Reuter et al. 
1998; Schmitz et al. 2012). HPV integrants near MYC have been 
associated with increased Myc expression, possibly due to viral 
enhancer sequences introduced nearby (Peter et al. 2006). Here we 



identified multiple means by which HPV integrants can disrupt 
neighboring genes, including gene rearrangements (DIAPH2 
complex rearrangements in UD-SCC-2), introduction of aber- 
rant promoters (TP63 in UM-SCC-47), and gene amplification 
(PIM1, FOXE1, C9orfl56 in UPCLSCC090). Even in UM-SCC- 
104 cells, where a lone HPV integrant was not associated with 
CNV (Fig. 3G), we nevertheless observed gene breaking (Sup- 
plemental Fig. 1 1; Wheelan et al. 2005). Such gene disruption by 
HPV integrants is likely of biological significance, given the 
established functions of affected genes. For example, we ob- 
served homozygous deletion of DIAPH2 in UD-SCC-2 cells. 
Mutations of this gene promote chromosomal instability via 
misalignment of sister chromatids during metaphase (Cheng 
et al. 2011). In UM-SCC-47 cells, the HPV integrant cluster caused 
aberrant expression of TP63, a critical regulator of epithelial dif- 
ferentiation. The highly expressed, truncated form of p63 protein 
contained a domain known to inhibit the pro-apoptotic protein, 
TAp63 (Serber et al. 2002). 

We note some limitations to our analysis. A majority of 
samples were cultured cells, chosen intentionally to allow more 
extensive experimentation (e.g., Figs. 2, 7) than would be possible 
with primary tumors. However, since exclusively episomal HPV 
may be lost in the derivation of cell lines from tumors, our data 
may overestimate the proportion of cancers bearing HPV inte- 
grants and looping events. Because the insert size of our WGS li- 
braries was —300 nucleotides (nt), we performed extensive addi- 
tional PCR, cloning, and sequence analysis to confirm connections 
within looped structures. Although we are confident in the exis- 
tence of these structures, in complex cases spanning kilobases or 
megabases in sequence, the exact order of recombined genomic 
segments could not be resolved unambiguously using conven- 
tional paired-end libraries. Although we currently are conducting 
long-range sequencing, even these new technologies will be un- 
able to resolve repetitive structures on a megabase scale. We cannot 
entirely exclude alternative sources of variability in copy numbers 
or insertional breakpoints, such as the simultaneous presence of 
different subclones in cell populations. Also, we were unable to 
map insertional breakpoints unambiguously at certain repetitive 
elements. To test the hypothesis that insertional mutagenesis and 
looping by HPV may contribute causally to cancer formation, by 
generating driver mutations rather than passenger or bystander 
mutations, currently we are engineering genetic knock-ins and 
knockouts in the cell lines studied. 

The looping model proposed here provides a framework to 
understand how HPV integrants could become flanked by neigh- 
boring CNVs in human cancers. Intriguing recent data showed 
that hepatitis B virus (HBV) integrants in hepatocellular carcinoma 
similarly can be associated with regional CNVs, viral-host fusion 
transcripts and alterations in host genome expression Qiang et al. 
2012; Sung et al. 2012; Toh et al. 2013), analogous to our findings 
on HPV integrants. However, although CNVs were identified near 
HBV integrants in —25% of cases, they and other structural vari- 
ants were not fully resolved. Moreover, in contrast to HPV, re- 
current "hotspots" of HBV integration in the host genome (e.g., at 
TERT) were observed. It will be interesting to determine whether 
other oncogenic viruses also are associated with similar genomic 
alterations. Our findings support an emerging hypothesis whereby 
the aberrant firing of origins of replication could contribute plau- 
sibly to genomic instability in human cancers in the absence of 
viral infection (Evertts and Coller 2012). Studies are ongoing to 
extend this looping model to additional HPV-associated primary 
tumors (e.g., anogenital cancers), and to evaluate the significance 
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of these genomic alterations for HPV-associated cancer pro- 
gression, targeted therapeutics, and patient outcomes. 

Methods 

Human cancer cell lines and primary tumors 

SiHa (Friedl et al. 1970), CaSki (Pattillo et al. 1977), CAL 27 
(Gioanni et al. 1988), D562 (Peterson et al. 1971), and SCC-25 
(Rheinwald and Beckett 1981) were acquired from the American 
Type Culture Collection (Manassas, VA). UM-SCC-47 (Lansford 
et al. 1999) and UM-SCC-104 (Tang et al. 2012) were kindly pro- 
vided by Thomas Carey, University of Michigan; UPCI:SCC090 
(White et al. 2007) were from Susanne M. Gollin, University of 
Pittsburgh; UD-SCC-2 (Ballo et al. 1999) were from Henning Bier, 
University of Dusseldorf; and HMS001 cells were from James 
Rocco, Harvard Medical School. Please see Table 1 and Supple- 
mental Table 1. 

CAL 27, D562, SCC-25, SiHa, and CaSki cells were cultured as 
recommended by ATCC. UM-SCC-104, UM-SCC-47, UD-SCC-2, 
and UPCI:SCC090 cells were cultured in Dulbecco's modified 
Eagle's medium (DMEM; Gibco) containing 2 mM L-glutamine, 
1% nonessential amino acids, 100 units/mL penicillin, 100 |xg/mL 
streptomycin, and 10% fetal bovine serum (FBS; Gibco). HMS001 
cells were cultured in DMEM:F12 (Gibco) medium supple- 
mented with 10% FBS and lx penicillin/streptomycin/glutamine 
(Mediatech). All cell lines were grown at 37°C in a humidified at- 
mosphere containing 5% C0 2 . Cells were harvested at 90% con- 
fluence. Genomic DNA was extracted using a Puregene DNA puri- 
fication kit (Gentra Systems). RNA was extracted in TRIzol (Life 
Technologies). 

De-identified, fresh-frozen HNSCC samples were obtained 
from a tumor bank at The Ohio State University. Specimen banking 
was approved by an Institutional Review Board for research pur- 
poses, and written consent was obtained from patients. OCT- 
embedded frozen sections were stained with hemotoxylin and eosin 
to guide microdissection, ensuring >80% tumor. Please see Table 1 
and Supplemental Table 1. Genomic DNA was purified by pro- 
teinase K digestion, phenol/chloroform/isoamyl alcohol extraction, 
and ethanol precipitation, and was quantified using Quant-iT 
PicoGreen dsDNA reagent (Invitrogen). RNA from primary tu- 
mors was extracted using the Ambion RNAqueous-4PCR kit. 

Whole-genome sequencing and analysis 

Whole-genome sequencing (WGS, 2 X 100-bp paired-end reads) 
was performed using the HiSeq 2000 platform via the Illumina 
Genome Network (IGN; http://www.illumina.com). Raw sequence 
reads were aligned to the human genome reference assembly 
(hgl9) using IGN ELAND. Small variants such as SNPs and small 
indels were detected by IGN CASAVA. Those variants previously 
observed in the 1000 Genomes Project Consortium (release phase 
1) (The 1000 Genomes Project Consortium 2010) and present in 
the dbSNP database at minor allele frequency >1% were excluded 
(Sherry et al. 2001). Predicted functional consequences of somatic 
variants, annotated using variant_effect_predictor (McLaren et al. 
2010) as per the Ensembl release 70 database, were classified into 
four tiers of predicted functional significance (tierl: variants with 
coding change and essential splice sites; tier2: conserved or regu- 
latory region; tier3: non-repeat region; tier4: other). 

To detect variants in both human and HPV genomes, we 
created a proxy reference assembly combining the human (hgl9) 
and HPV genomes (called hgl9+HPV; GenBank accession num- 
bers: HPV16 [NC_001526.2]; HPV18 [NC_001357.1]; HPV31 
[HQ537687.1]; HPV33 [HQ537707.1]; and HPV35 [M74117.1]). 



We used BWA (Li and Durbin 2010) to align WGS sequence reads 
against hgl9+HPV. Mpileup function from SAMtools (Li et al. 
2009) was used to detect SNPs and small INDELs in HPV genomes. 
HPV variants were classified by comparison of sequences to HPV16 
lineage-specific SNPs as reported by the IARC HPV Variant Study 
Group (Cornet et al. 2012). 

To estimate HPV copy numbers per sample, mean viral ge- 
nome coverage was divided by autosomal coverage (Table 1). A 
modified Hydra workflow (Quinlan and Hall 2010) was used to 
detect structural variants. Discordant read pairs were extracted by 
alignments against the reference assembly using GSNAP (version 
2012-07-20) (Wu and Nacu 2010). To detect copy number alter- 
ations, CNAnorm (Gusnanto et al. 2012) was used to evaluate read 
depths of WGS data in 50-kb windows for each sample. Combined 
read depth data from 20 normal individuals of both genders were 
used as a control. For those samples with corresponding cytoge- 
netic data, baselines for copy number detection were adjusted to 
average ploidy. 

Alignments supporting candidate chromosomal transloca- 
tions and HPV breakpoints were further confirmed by BLAT (Kent 
2002). We removed SV candidates with less than four supporting 
read pairs. SVs were categorized into known and novel variants by 
comparison with the Database of Genomic Variants, DGV (Iafrate 
et al. 2004). De novo assembly was performed using Velvet to 
identify structural variants (SV) and junctions (Zerbino et al. 2009). 

To visualize sequence alignments, we used the Broad In- 
stitute's Integrative Genome Viewer (IGV) (Thorvaldsdottir et al. 
2013). Circos plots (Krzywinski et al. 2009) were generated to vi- 
sualize variant distributions genome- wide. 

PCR amplification and Sanger sequencing 

To confirm insertional breakpoints, perform chromosome walk- 
ing, and model resolved linear structures of the host cell genome, 
we designed custom primers to amplify regions of interest by PCR, 
using standard methods (e.g., see blue bars in "evidence tracks" in 
Figs. 4-6). PCR products were sequenced directly using conven- 
tional Sanger sequencing. If unsuccessful, PCR products were 
cloned using the TOPO TA cloning kit (Life Technologies) and se- 
quenced from both directions using the M13F forward and M13 
reverse primers at the OSUCCC Nucleic Acids Core Facility or using 
a commercial sequencing service (Genewiz). 

Analysis of the relationship between HPV breakpoints, human 
RefSeq genes, and structural variation 

Genes neighboring HPV breakpoints were identified using the 
NCBI RefSeq database. We examined the overlap between HPV 
breakpoints and reported fragile sites within the human genome 
(n = 104) in the NCBI Gene database (mean size = 7.5 Mb, range 
0.7-28 Mb) and for DNase I hypersensitivity sites using 51 cell 
types from the UCSC Genome Browser ENCODE (The ENCODE 
Project Consortium 2011). In these analyses, we compared our 
observed distribution of HPV breakpoints to a simulated set of 1 
million random breakpoints (using randomly chosen human ge- 
nomic coordinates) as a control. Comparisons were performed 
using a binomial test. 

The relationship between HPV breakpoints and predicted DNA 
copy number change was evaluated using CNAnorm (Gusnanto 
et al. 2012). For this analysis, the human genome was divided into 
50-kb bins. Hyperamplification was defined as regions displaying 
copy numbers (ploidy) of N > 8, and breakpoint regions were de- 
fined as within ±250 kb from an HPV breakpoint. Associations 
were evaluated using a binomial test. P-values were adjusted by 
Bonferroni multiple testing correction. A similar analysis was 
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conducted to evaluate associations between HPV breakpoints and 
detected structural variants (chromosomal translocation, de- 
letion, inversion, intrachromosomal translocation) using Hy- 
dra. All statistical analyses (including those described above) 
were performed using the R/Bioconductor packages (http:// 
www.bioconductor.org). 

FISH [fluorescence in situ hybridization) and spectral 
karyotyping [SKY] analysis 

To evaluate chromosomal structures and counts in the various cell 
lines, we used spectral karyotyping (SKY) and fluorescence in situ 
hybridizations (FISH) (Schrock et al. 1996). Protocols can be ac- 
cessed at http://www.riedlab.nci.nih.gov/index.php/protocols. In 
brief, metaphase chromosome suspensions were prepared first by 
treating cells with a hypotonic solution (0.075 M KC1), followed by 
fixation using methanol: acetic acid (3:1 [v/v]) and then dropping 
them onto slides, using a Thermotron chamber to control hu- 
midity. Slides were aged for ~1 wk at 37°C prior to hybridization. 
Chromosome preparations were hybridized for 72 h with SKY 
probes (prepared in-house). 

SKY analysis of samples was performed using a Leica DMRXE 
microscope (Leica), DAPI and SKY filters (Chroma), a Xenon lamp, 
and a Spectracube (Applied Spectral Imaging). Spectrum-based 
classification and analysis of the fluorescent SKY images were 
achieved using SkyView software (Applied Spectral Imaging). 

For FISH analysis, slides were evaluated using a Leica DMRXE 
microscope, fitted with a mercury lamp and a CCD camera. Leica 
QFISH software was used to acquire cell images. 

For each cell line, —15-20 metaphase spreads were acquired 
for SKY and FISH analysis and scored for numerical and structural 
chromosomal aberrations according to established human chro- 
mosome nomenclature rules from ISCN (International Standing 
Committee on Human Cytogenetic Nomenclature 2009). 

RNA sequencing (RNA-seq) 

One microgram of total RNA was used for library preparation. Ei- 
ther 51-bp paired-end or 250-bp paired-end libraries were con- 
structed from 1 |xg of total RNA samples, using Illumina's stranded 
and unstranded library protocols. A TruSeq RNA sample prepara- 
tion kit (version 2; Illumina) was used to construct strand-specific 
RNA-seq libraries. Library construction included fragmentation, 
first-strand cDNA synthesis, second-strand cDNA synthesis, end 
repairing, 3 '-end adenylation, adaptor ligation, and PCR amplifi- 
cation with barcoded oligos. Briefly, total RNA was fragmented by 
heating. First-strand cDNA synthesis was performed (Life Tech- 
nologies). Reaction products were precipitated with ethanol and 
ammonium acetate, incubated for 2 h at -80°C and centrifuged at 
20,000# for 30 min at 4°C. Pellets were washed with ice-cold 70% 
ethanol, air-dried for 10 min, and then resuspended in second- 
strand synthesis buffer (New England Biolabs). After mixing, 
samples were incubated for 2.5 h at 16°C for second-strand syn- 
thesis. To preserve strand information in certain libraries, we added 
dUTP to the second-strand synthesis reaction. Samples were then 
cleaned using Agencourt AMPure XP beads (Beckman Coulter), ends 
were repaired, and 3' adenylation was performed. Adaptor ligation 
was performed as per the kit manual. Then, in some cases in which 
strand-specific information was to be preserved, prior to PCR ampli- 
fication, the incorporated uracils in the coding strand were degraded 
to eliminate the second strand of cDNA by adding USER enzyme 
(New England Biolabs) for 1 h at 37°C before being placed on ice. 

The resulting libraries were amplified by PCR and purified 
using Agencourt AMPure XP beads (Beckman Coulter). Library 
quality and concentrations were measured using high-sensitivity 



chips on an Agilent 2100 Bioanalyzer and a Qubit 2.0 (Life Tech- 
nologies). Based on measured concentrations, barcoded RNA-seq 
libraries were pooled and loaded onto an Illumina MiSeq in- 
strument (Genomics Facility, Cincinnati Children's Hospital and 
Medical Center). 

We aligned 51-bp paired-end reads against our hgl9+HPV ref- 
erence assembly using TopHat (version 1.4.1) (Trapnell et al. 2009), 
and 250-bp paired-end reads were aligned using GSNAP (Wu and 
Nacu 2010). Expression levels for known transcripts were de- 
termined (Ensembl release 67 human transcripts) using Cufflinks 
(version 1.3.0) (Trapnell et al. 2012). To detect HPV fusion tran- 
scripts, we used GSNAP to align reads and Hydra to detect chimeric 
HPV-host read pairs (Quinlan and Hall 2010). We used Cummer- 
bund to visualize gene expression levels (Trapnell et al. 2012). 

Rapid amplification of cDNA ends [RACE) 

RNA ligase-mediated RACE analysis was conducted using the 
Ambion FirstChoice RLM-RACE kit (Life Technologies). HPV or gene- 
specific forward and reverse primers were used to amplify 5' and 3' 
cDNA ends (Klaes et al. 1999). Resulting PCR products were cloned 
using the TOPO TA cloning kit (Life Technologies) and se- 
quenced in both directions using the M13F forward and Ml 3 
reverse primers. 

Additional methods 

Quantitative PCR (qPCR) assays were performed to quantify focal 
amplification at genomic regions detected by WGS (Supplemental 
Fig. 8). Northern blot hybridization was used to assay transcripts 
from TP63, DIAPH2, HPV 16 E7, HPV16 E5, and RPLP0, a loading 
control (Fig. 7). Western blotting was performed to assess expres- 
sion of TP63 and diaphanous-related formin 2 proteins (Fig. 7). 
Additional details for quantitative (real-time) PCR, Northern blot 
hybridization, Western blot hybridization, and peptide competi- 
tion experiments are provided in the Supplemental Methods. 

Data access 

WGS data have been deposited at the European Genome-phenome 
Archive (EGA; http://www.ebi.ac.uk/ega/), hosted by the European 
Bioinformatics Institute (EBI), European Molecular Biology Labo- 
ratory (EMBL), under accession number EGAS00001000599. 
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